Add support for downloading ERA5 pressure levels data product (new) #197
Merged
…ressureLevels
- Updated exports to include ERA5HourlyPressureLevels, ERA5MonthlyPressureLevels, ERA5_all_pressure_levels, pressure_field
- Added using Statistics, Oceananigans.Fields.CenterField/interior, Oceananigans.BoundaryConditions.fill_halo_regions!, native_grid, InverseGravity to imports
- Extended import block to include is_three_dimensional, reversed_vertical_axis, conversion_units
- Added ERA5_all_pressure_levels constant (37 standard hPa levels)
- Added ERA5PressureDataset, ERA5HourlyPressureLevels, ERA5MonthlyPressureLevels with keyword constructors
- Added ERA5PressureMetadata{D} and ERA5PressureMetadatum type aliases
- Added Base.size, all_dates, is_three_dimensional, reversed_vertical_axis dispatches
- Added ERA5PL_dataset_variable_names and ERA5PL_netcdf_variable_names dicts (15 variables each)
- Added available_variables, dataset_variable_name, netcdf_variable_name, conversion_units dispatches
- Added retrieve_data(::ERA5PressureMetadatum) — reads 4D NetCDF, reverses vertical axis
- Added metadata_prefix(::ERA5PressureMetadata) — uses ERA5PL_dataset_variable_names for filename construction
- Added _std_atm_geopotential_height, _std_atm_z_interfaces, z_interfaces(::ERA5PressureMetadata)
- Added pressure_field and mean_geopotential_heights
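As a rough illustration of the pressure-level pieces listed above, the following sketch lists the 37 standard ERA5 pressure levels, sorts them so that the first entry is nearest the surface, and estimates heights from the International Standard Atmosphere tropospheric formula. The names `era5_levels_hPa` and `std_atm_height_sketch` are hypothetical and are not the PR's `ERA5_all_pressure_levels` or `_std_atm_geopotential_height`; the formula is only valid below roughly 11 km (pressures above about 226 hPa), and the PR may treat higher levels differently.

```julia
# The 37 standard ERA5 pressure levels, in hPa (top of atmosphere first).
era5_levels_hPa = [1, 2, 3, 5, 7, 10, 20, 30, 50, 70,
                   100, 125, 150, 175, 200, 225, 250,
                   300, 350, 400, 450, 500, 550, 600, 650, 700, 750,
                   775, 800, 825, 850, 875, 900, 925, 950, 975, 1000]

# Sort descending in pressure so that index 1 is the level nearest the surface.
# (Assumed ordering, chosen so that height increases with index.)
levels_bottom_up = sort(era5_levels_hPa; rev = true)

# Hypothetical helper: ISA height (m) for a pressure in hPa, troposphere only.
function std_atm_height_sketch(p_hPa)
    T0 = 288.15    # sea-level temperature, K
    p0 = 1013.25   # sea-level pressure, hPa
    Γ  = 0.0065    # temperature lapse rate, K/m
    R  = 287.053   # specific gas constant of dry air, J/(kg K)
    g0 = 9.80665   # standard gravity, m/s^2
    return (T0 / Γ) * (1 - (p_hPa / p0)^(R * Γ / g0))
end

# Heights for the levels where the tropospheric formula applies.
tropospheric_levels = filter(p -> p >= 250, levels_bottom_up)
heights = std_atm_height_sketch.(tropospheric_levels)
```

With this ordering, `heights` increases monotonically with index, which is the property a vertical grid built from the levels would rely on.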
…s, without explicitly defining metadata
Combined CDSAPI requests return a single combined NetCDF file; set cleanup=false to keep the "_tmp_multi_<datetime>_*" files
Note: inpainting is now turned OFF for 2D data. The reanalysis data should be complete, and turning on inpainting would artificially fill in data that should be masked (e.g., ocean quantities over land). Tested for 2D (single levels) and 3D (pressure levels).
to disambiguate from ERA5PressureLevels*
Clipped field will match downloaded bounding box, tested with ERA5
rename levels -> pressure_levels for clarity
…to eq/era5_pressure_levels
…to eq/era5_pressure_levels
Per @glwagner's review: a `Field{Nothing, Nothing, Center}` represents the same per-level pressure values as the previous `CenterField` but without copying across the full horizontal grid. Use `set!` for the column assignment, drop the per-k loop, and let the field's eltype follow the grid (no more hardcoded `Float32`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surface-level testsets each downloaded `2m_temperature` then removed it, forcing the next testset to re-download the same bytes: three CDS round-trips for one fixture. Keep the pre-clean only in the testset that's testing the download path itself; let downstream testsets reuse the file and run cleanup in the last consumer.
Pressure-level "Geopotential height conversion" pre-cleaned the `geopotential_...nc` that the previous testset's `z_interfaces` side-effect leaves on disk, then re-downloaded it. Drop the pre-clean and the redundant explicit `download_dataset(meta_z)` call (Field() already downloads if needed).
Net: surface round-trips drop 3 → 1, geopotential round-trips drop 2 → 1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add `build_era5_area` methods for `Column` (Linear and Nearest) so
`FieldTimeSeries(Metadata(...; region=Column))` no longer hits MethodError.
Linear pads ε=0.3° (slightly more than ERA5's 0.25° native spacing) so the
downloaded file contains the 2x2 stencil bilinear interpolation needs;
Nearest uses ε=1e-3°.
- Realign `dataset_variable_name(::ERA5*Metadata)` to return the in-file
short name (e.g. "u") instead of the CDS API catalog name
("u_component_of_wind"), matching the docstring ("the name used for the
variable in its raw dataset file"). The CDS API name is still accessed
via the `*_dataset_variable_names` dict directly in CDSAPIExt. Drops
the now-redundant `netcdf_variable_name` methods. This fixes
`column_field_from_file` for ERA5 — it calls `dataset_variable_name`
generically and previously got the wrong name back.
- Validate cached file's vertical extent in `column_field_from_file` and
`mean_geopotential_heights`. A stale cache from a previous run with
different `pressure_levels` previously produced silent NaN data or a
cryptic broadcast DimensionMismatch; now throws a clear actionable error.
- Tighten warning text in `z_interfaces` ("Failed to derive geopotential
heights" rather than "Failed to download") and attach `catch_backtrace`
so the underlying cause is visible in the warning.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
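The `Column` padding described in this commit can be sketched as follows. This is a minimal illustration, not the PR's `build_era5_area`: the function name `column_area_sketch` and its signature are hypothetical. It assumes the CDS `area` convention of `[North, West, South, East]` in degrees and clamps latitude to the poles.

```julia
# Hypothetical helper: build a CDS-style area [North, West, South, East]
# around a single (lon, lat) column, padded by ε degrees on each side.
function column_area_sketch(lon, lat; ε = 0.3)
    north = min(lat + ε, 90.0)    # clamp to the north pole
    south = max(lat - ε, -90.0)   # clamp to the south pole
    west  = lon - ε
    east  = lon + ε
    return [north, west, south, east]
end

# Linear interpolation: pad by slightly more than the 0.25° native spacing so
# the box always contains grid points on both sides of the column.
linear_area = column_area_sketch(10.0, 45.0; ε = 0.3)

# Nearest-neighbor: a very small pad around the point.
nearest_area = column_area_sketch(10.0, 45.0; ε = 1e-3)
```

With ε = 0.3° the box is 0.6° wide in each direction, wider than two 0.25° grid cells, so a grid point within 0.25° of the column on either side falls inside the box.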
Two new top-level testsets exercise internals of `NumericalEarthCDSAPIExt`
that were previously uncovered. All tests are hermetic (no CDS API, no
filesystem dependencies beyond `mktempdir`).
- "ERA5 CDSAPIExt dispatch helpers and area construction":
* `cds_product`, `cds_varnames`, `nc_varnames`, `coord_vars` for
single-level and pressure-level datasets;
* `extra_request_keys!` (no-op for single level, populates
`pressure_level` for pressure-level datasets);
* `build_era5_area` for `Nothing`, `BoundingBox` (both axes set,
one axis missing), `Column{Linear}`, `Column{Nearest}`.
- "ERA5 CDSAPIExt NetCDF copy and split helpers":
* `ncvar_copy!` round-trips data, attributes, and fill values;
* `ncvar_copy_tslice!` correctly handles both time-dependent and
time-independent variables;
* `split_era5_nc` and `split_era5_nc_multistep` produce one output
file per (variable[, timestep]) request, skipping variables not
present in the source.
Synthetic ERA5-shaped NetCDFs are written via NCDatasets and discarded
with `mktempdir`; the extension's private symbols are reached through
`Base.get_extension`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…to eq/era5_pressure_levels
glwagner
reviewed
Apr 30, 2026
glwagner
left a comment
Should we combine the ERA5 data demos into a single example? And add to docs? Generally we should not have examples that aren't in the docs (I believe there are still some orphaned right now, but we need to clean that up)
Cover the parts of `src/DataWrangling/ERA5/ERA5_single_levels.jl` that the integration tests don't exercise directly:
- hourly `all_dates` step (mirrors the existing monthly test);
- the API/netcdf variable-name dicts staying in sync;
- the `available_variables` / `dataset_variable_name` dispatch (catches the easy swap between API catalog name and netcdf short name);
- `default_inpainting` returning `nothing` (the wrong value here silently makes Field construction expensive);
- `metadata_prefix` filename construction across the single-date / multi-date / no-region branches, plus filename-safety transformations.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
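A check that two variable-name dictionaries stay in sync might look like the sketch below. The dictionary names, the `Symbol` keys, and the selection of entries are hypothetical and only illustrate the idea; the values are the CDS catalog names and the NetCDF short names that ERA5 single-level files use for these variables.

```julia
# Hypothetical stand-ins for the API-name and in-file-name dictionaries.
api_names_sketch = Dict(
    :temperature       => "2m_temperature",
    :eastward_velocity => "10m_u_component_of_wind",
)

netcdf_names_sketch = Dict(
    :temperature       => "t2m",
    :eastward_velocity => "u10",
)

# The dictionaries are in sync when they describe the same set of variables.
in_sync = Set(keys(api_names_sketch)) == Set(keys(netcdf_names_sketch))

# Guard against the easy swap: no catalog name should appear as a short name.
no_swap = isempty(intersect(values(api_names_sketch), values(netcdf_names_sketch)))
```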
Extract `_group_by_calendar_day(datetimes)` from the inline comprehensions in two `download_dataset` overloads so the grouping logic is testable in isolation. Test boundary cases:
- 00:00 belongs to its own day, not the previous one
- multi-day interleaved input
- duplicate datetimes are preserved
- single-element input
Also add tests for the `skip_existing=true` short-circuit in three multi-file paths (multi-variable pressure-level, single-variable multi-date, multi-variable multi-date). Pre-create the expected output files in a tempdir and assert each path returns without invoking CDSAPI; if the short-circuit ever regresses the test will throw a credentials/network error and fail loudly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
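The grouping behavior described in this commit can be sketched with the `Dates` standard library alone. The function below is an illustrative guess at what such a helper could look like, not the PR's `_group_by_calendar_day`; in particular the return type (a vector of `Date => Vector{DateTime}` pairs sorted by date) is an assumption.

```julia
using Dates

# Hypothetical sketch: group datetimes by calendar day, keeping input order
# within each day and preserving duplicates.
function group_by_calendar_day_sketch(datetimes)
    groups = Dict{Date, Vector{DateTime}}()
    for dt in datetimes
        # Date(dt) truncates to the calendar day, so 00:00 stays in its own day.
        push!(get!(groups, Date(dt), DateTime[]), dt)
    end
    # Return pairs sorted by day for a deterministic order.
    return sort(collect(groups); by = first)
end

# Interleaved days, a duplicate, and a midnight timestamp.
datetimes = [DateTime(2020, 1, 2, 0), DateTime(2020, 1, 1, 23),
             DateTime(2020, 1, 2, 0), DateTime(2020, 1, 1, 0)]

grouped = group_by_calendar_day_sketch(datetimes)
```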
…to eq/era5_pressure_levels
…ls, and CDSAPI ext
Cover several pure functions and dispatch overloads that the integration tests either skip or don't exercise directly. Each test is no-network and self-contained.
`metadata_field.jl`:
- `restrict` (BoundingBox grid construction): identity, half-domain, small bbox, off-origin bbox, and the `Nothing` pass-through dispatch.
- `restrict_location` for all three region kinds (`BoundingBox` / `Nothing` / `Column`), confirming the Column path reduces horizontal locations to `Nothing`.
`ERA5_pressure_levels.jl`:
- Constructors sort levels descending: pass ASCENDING input (`[500, 850]hPa`) so the test fails if `sort(...; rev=true)` regresses to a no-op or different order. Covered for both Hourly and Monthly variants.
- `stagger`: pure function that converts ascending centers to Nz+1 staggered interfaces. Covers two-element evenly-spaced, three-element evenly-spaced, and three-element irregular cases (verifies extrapolation formula at top/bot and midpoint formula in the interior).
`NumericalEarthCDSAPIExt.jl`:
- `is_zip`: ZIP magic header detected, non-magic bytes rejected, short (<4 byte) files rejected.
- `foreach_nc`: non-zip path calls `f` exactly once with the input path; zip path extracts and visits each `.nc`, ignoring non-`.nc` entries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
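Two of the pure functions mentioned in this commit are simple enough to sketch. Both functions below are hypothetical reconstructions from the description, not the PR's code. `stagger_sketch` assumes interior interfaces are midpoints between neighboring centers and that the end interfaces are extrapolated by half the adjacent spacing. `is_zip_sketch` assumes the check compares the first four bytes against the ZIP local-file-header signature `PK\x03\x04`.

```julia
# Hypothetical sketch: convert n ascending centers into n + 1 interfaces.
function stagger_sketch(centers::AbstractVector{<:Real})
    n = length(centers)
    n >= 2 || throw(ArgumentError("need at least two centers"))
    faces = Vector{Float64}(undef, n + 1)
    # Bottom interface: extrapolate by half the first spacing.
    faces[1] = centers[1] - (centers[2] - centers[1]) / 2
    # Interior interfaces: midpoints between neighboring centers.
    for i in 2:n
        faces[i] = (centers[i-1] + centers[i]) / 2
    end
    # Top interface: extrapolate by half the last spacing.
    faces[n+1] = centers[n] + (centers[n] - centers[n-1]) / 2
    return faces
end

# Hypothetical sketch: detect a ZIP archive by its 4-byte magic header.
function is_zip_sketch(path)
    open(path, "r") do io
        magic = read(io, 4)   # returns fewer than 4 bytes for short files
        return magic == UInt8[0x50, 0x4b, 0x03, 0x04]
    end
end

even_faces      = stagger_sketch([1.0, 2.0, 3.0])
irregular_faces = stagger_sketch([0.0, 1.0, 3.0])
```

For the irregular case the first spacing is 1 and the last is 2, so the end interfaces sit 0.5 below the first center and 1.0 above the last.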
Co-authored-by: Gregory L. Wagner <wagner.greg@gmail.com>
glwagner
approved these changes
Apr 30, 2026
This replaces #93. All previous comments and suggestions have already been addressed — the difference is that this new PR originates from this repo rather than my fork. The reason for opening a new PR is that the CI from my fork was failing due to empty credentials.
Closes #88.